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ABSTRACT 

Context. Peak counts have been shown to be an excellent tool for extracting the non-Gaussian part of the weak lensing signal. 
Recently, we developed a fast stochastic forward model to predict weak-lensing peak counts. Our model is able to reconstruct the 
underlying distribution of observables for analysis. 

Aims. In this work, we explore and compare various strategies for constraining a parameter using our model, focusing on the matter 
density O ra and the density fluctuation amplitude <x 8 . 

Methods. First, we examine the impact from the cosmological dependency of covariances (CDC). Second, we perform the analysis 
with the copula likelihood, a technique that makes a weaker assumption than does the Gaussian likelihood. Third, direct, non-analytic 
parameter estimations are applied using the full information of the distribution. Fourth, we obtain constraints with approximate 
Bayesian computation (ABC), an efficient, robust, and likelihood-free algorithm based on accept-reject sampling. 

Results. We find that neglecting the CDC effect enlarges parameter contours by 22% and that the covariance-varying copula likelihood 
is a very good approximation to the true likelihood. The direct techniques work well in spite of noisier contours. Concerning ABC, 
the iterative process converges quickly to a posterior distribution that is in excellent agreement with results from our other analyses. 
The time cost for ABC is reduced by two orders of magnitude. 

Conclusions. The stochastic nature of our weak-lensing peak count model allows us to use various techniques that approach the true 
underlying probability distribution of observables, without making simplifying assumptions. Our work can be generalized to other 
observables where forward simulations provide samples of the underlying distribution. 

Key words. Gravitational lensing: weak, Cosmology: large-scale structure of Universe, Methods: statistical 


1. Introduction 


Weak lensing (WL) is a gravitational deflection effect of light 
by matter inhomogeneities in the Universe that causes distortion 
of source galaxy images. This distortion corresponds to the in¬ 
tegrated deflection along the line of sight, and its measurement 
probes the high-mass regions of the Universe. These regions con¬ 
tain structures that formed during the late-time evolution of the 
Universe, which depends on cosmological parameters, such as 
the matter density parameter G m , the matter density fluctuation 
<rg, and the equation of state of dark energy w. Ongoing and 
future surveys such as KiDS #*, DES t2 , HSC ' 3 , WFIRST :4 , 
Euclid 45 , and LSST :6 are expected to provide tight constraints 
on those and other cosmological parameters and to distinguish 
between different cosmological models, using weak lensing as a 
major probe. 

Lensing signals can be extracted in several ways. A com¬ 
mon observable is the cosmic shear two-point-correlation func¬ 
tion (2PCF), which has been used to constrain cosmological pa¬ 
rameters in many studies, including recent ones (Kilbinger et al. 


1.1 http: //kids. strw. leidenuniv. nl/ 

1.2 http: //www. darkenergysurvey. org/ 

t* 3 http://www.naoj .org/Projects/HSC/HSCProject.html 

1.4 http://wfirst.gsfc.nasa.gov/ 

1.5 http://www.euclid-ec.org/ 

36 http://www.lsst.org/lsst/ 


2013; Jee et al. 2013). However, the 2PCF only retains Gaussian- 
ity, and it misses the rich nonlinear information of the structure 
evolution encoded on small scales. To compensate for this draw¬ 
back, several non-Gaussian statistics have been proposed, for 
example higher order moments (Kilbinger & Schneider 2005; 
Semboloni et al. 2011; Fu et al. 2014; Simon et al. 2015), the 
three-point correlation function (Schneider & Lombardi 2003; 
Takada & Jain 2003; Scoccimarro et al. 2004), Minkowski func¬ 
tionals (Petri et al. 2015), or peak statistics, which is the aim of 
this series of papers. Some more general work comparing differ¬ 
ent strategies to extract non-Gaussian information can be found 
in the literature (Pires et al. 2009; Berge et al. 2010; Pires et al. 
2012 ). 

Peaks, defined as local maxima of the lensing signal, are 
direct tracers of high-mass regions in the large-scale structure 
of the Universe. In the medium and high signal-to-noise (S/N) 
regime, the peak function (the number of peaks as function of 
S/N) is not dominated by shape noise, and this permits one to 
study the cosmological dependency of the peak number counts 
(Jain & Van Waerbeke 2000). Various aspects of peak statis¬ 
tic have been investiagated in the past: the physical origin of 
peaks (Hamana et al. 2004; Yang et al. 201 1), projection effects 
(Marian et al. 2010), the optimal combination of angular scales 
(Kratochvil et al. 2010; Marian et al. 2012), redshift tomography 
(Hennawi & Spergel 2005), cosmological parameter constraints 
(Dietrich & Hartlap 2010; Liu et al. 2014), detecting primordial 
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non-Gaussianity (Maturi et al. 2011; Marian et al. 2011), peak 
statistics beyond the abundance (Marian et al. 2013), the impact 
from baryons (Yang et al. 2013; Osato et al. 2015), magnifica¬ 
tion bias (Liu et al. 2014), and shape measurement errors (Bard 
et al. 2013). Recent studies by Liu, Petri, Haiman et al. (2015a, 
hereafter LPH15), Liu, Pan, Li et al. (2015b, hereafter LPL15), 
and Hamana et al. (2015) have applied likelihood estimation for 
WL peaks on real data and have shown that the results agree with 
the current ACDM scenario. 

Modeling number counts is a challenge for peak studies. To 
date, there have been three main approaches. The first one is to 
count peaks from a large number of A'-body simulations (Di¬ 
etrich & Hartlap 2010; LPH15), which directly emulate struc¬ 
ture formation by numerical implementation of the correspond¬ 
ing physical laws. The second family consists of analytic pre¬ 
dictions (Maturi et al. 2010; Fan et al. 2010) based on Gaussian 
random field theory. A third approach has been introduced by 
Lin & Kilbinger (2015, hereafter Paper I): Similar to Kruse & 
Schneider (1999) and Kainulainen & Marra (2009, 2011a,b), we 
propose a stochastic process to predict peak counts by simulat¬ 
ing lensing maps from a halo distribution drawn from the mass 
function. 

Our model possesses several advantages. The first one is flex¬ 
ibility. Observational conditions can easily be modeled and taken 
into account. The same is true for additional features, such as in¬ 
trinsic alignment of galaxies and other observational and astro- 
physical systematics. Second, since our method does not need 
A'-body simulations, the computation time required to calculate 
the model are orders of magnitudes faster, and we can explore 
a large parameter space. Third, our model explores the under¬ 
lying probability density function (PDF) of the observables. All 
statistical properties of the peak function can be derived directly 
from the model, making various parameter estimation methods 
possible. 

In this paper, we apply several parameter constraint and like¬ 
lihood methods for our peak-count-prediction model from Pa¬ 
per I. Our goal is to study and compare different strategies and 
to make use of the full potential of the fast stochastic forward 
modeling approach. We start with a likelihood function that is 
assumed to be Gaussian in the observables with constant co- 
variance and then compare this to methods that make fewer and 
fewer assumptions, as follows. 

The first extension of the Gaussian likelihood is to take the 
cosmology-dependent covariances (CDC, see Eifler et al. 2009) 
into account. Thanks to the fast performance of our model, it is 
feasible to estimate the covariance matrix for each parameter set. 

The second improvement we adopt is the copula analysis 
(Benabed et al. 2009; Jiang et al. 2009; Takeuchi 2010; Scher- 
rer et al. 2010; Sato et al. 2011) for the Gaussian approxima¬ 
tion. Widely used in finance, the copula transform uses the fact 
that any multivariate distribution can be transformed into a new 
one where the marginal PDF is uniform. Combining successive 
transforms can then give rise to a new distribution where all 
marginals are Gaussian. This makes weaker assumptions about 
the underlying likelihood than the Gaussian hypothesis. 

Third, we directly estimate the full underlying distribution 
information in a non-analytical way. This allows us to strictly 
follow the original definition of the likelihood estimator: the con¬ 
ditional probability of observables for a given parameter set. In 
addition, we compute the p-value from the full PDF. These p- 
values derived for all parameter sets allow for significance tests 
and provide a direct way to construct confidence contours. 

Furthermore, our model makes it possible to dispose of 
a likelihood function altogether, using approximate Bayesian 


computation (ABC, see e.g. Marin et al. 2011) for exploring 
the parameter space. ABC is a powerful constraining technique 
based on accept-reject sampling. Proposed first by Rubin (1984), 
ABC produces the posterior distribution by bypassing the likeli¬ 
hood evaluation, which may be complex and practically unfeasi¬ 
ble in some contexts. The posterior is constructed by comparing 
the sampled result with the observation to decide whether a pro¬ 
posed parameter is accepted. This technique can be improved by 
combining ABC with population Monte Carlo (PMC tl7 , Beau¬ 
mont et al. 2009; Cameron & Pettitt 2012; Weyant et al. 2013). 
Until now, ABC seems to already have various applications in 
biology-related domains (e.g., Beaumont et al. 2009; Berger 
et al. 2010; Csillery et al. 2010; Drovandi & Pettitt 201 1), while 
applications for astronomical purposes are few: morphological 
transformation of galaxies (Cameron & Pettitt 2012), cosmo¬ 
logical parameter inference using type la supernovae (Weyant 
et al. 2013), constraints of the disk formation of the Milky Way 
(Robin et al. 2014), and strong lensing properties of galaxy clus¬ 
ters (Killedar et al. 2015). Very recently, two papers (Ishida et al. 
2015; Akeret et al. 2015) dedicated to ABC in a general cosmo¬ 
logical context have been submitted. 

The paper is organized as follows. In Sect. 2, we briefly re¬ 
view our model introduced in Paper I, the setting for the param¬ 
eter analysis, and the criteria for defining parameter constraints. 
In Sect. 3, we study the impact of the CDC effect. The results 
from the copula likelihood can be found in Sect. 4, and in Sect. 5 
we estimate the true underlying PDF in a non-analytic way and 
show parameters constraints without the Gaussian hypothesis. 
Sect. 6 focuses on the likelihood-free ABC technique, and the 
last section is dedicated to a discussion where we summarize 
this work. 

2. Methodology 

2.1. Our model 

Our peak-count model uses a probabilistic approach that gen¬ 
erates peak catalogs from a given mass function model. This is 
done by generating fast simulations of halos, computing the pro¬ 
jected mass, and simulating lensing maps from which one can 
extract WL peaks. A step-by-step summary is given as follows: 

1. sample halo masses and assign density profiles and positions 
(fast simulations), 

2. compute the projected mass and subtract the mean over the 
field (ray-tracing simulations), 

3. add noise and smooth the map with a kernel, and 

4. select local S/N maxima. 

Here, two assumptions have been made: (1) only bound mat¬ 
ter contributes to number counts and (2) the spatial correlation 
of halos has a small impact on WL peaks. Paper I showed that 
combining both hypotheses gives a good estimation of the peak 
abundance. 

We adopt the same settings as Paper I: the mass function 
model from Jenkins et al. (2001), the truncated Navarro-Frenk- 
White halo profiles (Navarro et al. 1996, 1997), Gaussian shape 
noise, the Gaussian smoothing kernel, and sources at fixed red- 
shift which are distributed on a regular grid. The field of view is 
chosen such that the effective area after cutting off the border is 
25 deg 2 . An exhausted list of parameter values used in this paper 

1,7 This algorithm is called PMC ABC by some and SMC (sequential 
Monte Carlo) ABC by others. 
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Table 1. List of parameter values adopted in this study. 


2.2. Analysis design 


Parameter 

Symbol 

Value 

Lower sampling limit 

- 

10 12 Mo/h 

Upper sampling limit 

- 

10 17 M Q /h 

NFW inner slope 

a 

1 

M-c relation parameter 

Co 

11 

M-c relation parameter 

p 

0.13 

Source redshift 

z s 

1 

Intrinsic ellipticity dispersion 

G 

0.4 

Galaxy number density 

"g 

25 aremin -2 

Pixel size 

^pix 

0.2 aremin 

Kernel size 


1 aremin 

Shape noise 

c p ix 

0.283 

Smoothed noise 

^noise 

0.0226 

Effective field area 

- 

25 deg 2 


1.6 

1.4 

1.2 

1.0 

b°° 0.8 

0.6 

0.4 

0.2 


Fig. 1. Location of 7821 points on which we evaluate the likelihoods. 
In the condensed area, the interval between two grid points is 0.005, 
while in both wing zones it is 0.01. The black star shows (n“,cr‘ 8 n ) = 
(0.28,0.82). 


Explored parameter regions 


0.2 


0.4 


0.6 


0.8 


can be found in Table 1. Readers are encouraged to read Paper I 
for their definitions and for detailed explanations for our model. 

All computations with our model in this study are performed 
by our Camelus algorithm A realization (from a mass func¬ 
tion to a peak catalog) of a 25-deg 2 field costs few seconds to 
generate on a single-CPU computer. The real time cost depends 
of course on input cosmological parameters, but this still gives 
an idea about the speed of our algorithm. 


Throughout this paper, n denotes a parameter set. To simplify the 
study, the dimension of the parameter space is reduced to two: 
7t = (Q m , crg). The other cosmological parameters are fixed, in¬ 
cluding O b = 0.047, h = 0.78, n s = 0.95, and w = -1. The dark 
energy density 0,| e is set to 1 - Q m to match a flat universe. On 
the O m -crg plane, we explore a region where the posterior den¬ 
sity, or probability, is high, see Fig. 1 . We compute the values of 
three different log likelihoods on the grid points of these zones. 
The grid size of the center zone is AO m = A<xg = 0.005, whereas 
it is 0.01 for the rest. This results in a total of 7821 points in the 
parameter space to evaluate. 

For each n. we carry out N = 1000 realizations of a 
25-deg 2 field and determine the associated data vector x tk] = 
(xf\ ..., x (k> ) for all k from 1 to N. These are independent sam¬ 
ples drawn from their underlying PDF of observables for a given 
parameter n. We estimate the model prediction (which is the 
mean), the covariance matrix, and the inverse matrix (Hartlap 
et al. 2007), respectively, by following 


mod 

x i 


Cij 

p 


1 

N 


N 



k=l 


xr4-f Z W‘> - 4”") 

k— 1 

N - d - 2 ~i 
N- 1 


and 


(1) 

( 2 ) 

(3) 


where d denotes the dimension of data vector. This results in a 
total area of 25 000 deg 2 for the mean estimation. 

In this paper, the observation data x obs are identified with a 
realization of our model, which means that x obs is derived by 
a particular realization of x(n m ). The input parameters chosen 
are n in = (f2“,cr“) = (0.28,0.82). The authors would like to 
highlight that the accuracy of the model is not the aim of this 
research work, but precision. Therefore, the input choice and the 
uncertainty of random process should have little impact. 

Peak-count information can be combined into a data vector 
using different ways. Inspired by Dietrich & Hartlap (2010) and 
LPL15, we studied three types of observables. The first is the 
abundance of peaks found in each S/N bin (labeled abd), in other 
words, the binned peak function. The second is the S/N values at 
some given percentiles of the peak cumulative distribution func¬ 
tion (CDF, labeled pet). The third is similar to the second type, 
but without taking peaks below a threshold S/N value (labeled 
cut) into account. Mathematically, the two last types of observ¬ 
ables can be denoted as x,-, thereby satisfying 


Pi = | «peak(v)dv, 


(4) 


where n pea k(V) is the peak PDF function, v m j n a cutoff, and p, a 
given percentile. The observable x abd is used by LPL15, while 
readers find x cut from Dietrich & Hartlap (2010). We would like 
to clarify that using x pcl for analysis could by risky, since this 
includes peaks with negative S/N. From Paper I, we observe 
that although high-peak counts from our model agree well with 
/V-body simulations, predictions for local maxima found in un¬ 
derdensity regions (peaks with S/N < 0) are inaccurate. Thus, 
we include x pcl in this paper only to give an idea about how 
much information we can extract from observables defined by 
percentiles. 

* 8 http://github. com/Linc-tw/camelus 
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Table 2. Definition of jc abd5 , jt pct5 , and x cut5 . Parameters such as v min 
and pi are used by Eq. (4). As an indication, their values for the input 
cosmology n m are also given. They were calculated by averaging over 
1000 realizations. 


Label 
Bins on v 
Xj for 7r ln 

abd5 

[3.0, 3.8[ 
330 

[3.8,4.5[ 
91 

[4.5, 5.3[ 
39 

[5.3, 6.2[ 
18 

[6.2, +oo[ 

15 

Label 

Unin 

Pi 

Xj for 7r ln 

pct5 

—CO 

0.969 

3.5 

0.986 

4.1 

0.994 

4.9 

0.997 

5.7 

0.999 

7.0 

Label 

cut 5 





^min 

3 





Pi 

0.5 

0.776 

0.9 

0.955 

0.98 

Xj for 7T ln 

3.5 

4.1 

4.9 

5.7 

6.7 


Table 3. Correlation matrices of x abd5 , x pct5 , and x cut5 in the input cos¬ 
mology. For jc abd5 , the peak abundance is weakly correlated between 
bins. 



( 1 


-0.05 

-0.09 

-0.08 

-0.16\ 


-0.05 

1 

-0.05 

-0.01 

-0.12 

abd5 

-0.09 

-0.05 

1 

-0.04 

-0.11 


-0.08 

-0.01 

-0.04 

1 

-0.06 


1-0.16 

-0.12 

-0.11 

-0.06 


i ; 


f 1 

0.62 

0.29 

0.15 

0.11) 



0.62 

1 

0.58 

0.36 

0.25 


pct5 


0.29 

0.58 

1 

0.66 

0.43 



0.15 

0.36 

0.66 

1 

0.59 



lo.n 

0.25 

0.43 

0.59 

1 



f 1 

0.58 

0.31 

0.20 

0.15 



0.58 

1 

0.61 

0.39 

0.28 


cut 5 


0.31 

0.61 

1 

0.65 

0.47 



0.20 

0.39 

0.65 

1 

0.70 



lo.l5 

0.28 

0.47 

0.70 

1 J 



Observable vectors are constructed by the description above 
with the settings of Table 2. This choice of bins and p, is made 
such that the same component from different types of observ¬ 
ables represents about the same information, since the bin center 
of x abd5 roughly correspond to x cutb for the input cosmology n m . 
Following LPL15, who discovered in their study that the bin- 
width choice has a minor impact on parameter constraints if the 
estimated number count in each bin is > 10, we chose not to ex¬ 
plore different choices of binwidths for ,r lh<lb . We also note that 
Pi for x dut5 are logarithmically spaced. 

By construction, the correlation between terms of percentile¬ 
like vectors is much higher than for the case of peak abundance. 
This tendency is shown in Table 3 for the 7r' n cosmology. We 
discovered that jc pct5 and x cut5 are highly correlated, while for 
x ahd5 , the highest absolute value of off-diagonal terms does not 
exceed 17%. A similar result was observed when we binned data 
differently. This suggests that the covariance should be included 
in likelihood analyses. 

2.3. Constraint qualification 

In this paper, both Bayesian inferences and likelihood-ratio tests 
(see, e.g.. Theorem 10.3.3 from Casella & Berger 2002) have 
been performed. To distinguish between these two cases, we call 


credible region the posterior PDF obtained from the Bayesian 
approach, which differs from the confidence region, whose inter¬ 
pretation can be found in Sect. 5.2. 

To quantify parameter-constraint contours, we introduce two 
criteria. Inspired by Jain & Seljak (1997) and Maoli et al. (2001), 
the first criterion is to determine the error on 

Z 8 = cr 8 (Q m /0.27r. (5) 

Since the banana-shaped contour becomes more or less an elon¬ 
gated ellipse in log space, AE 8 represents the “thickness” of the 
banana, tilted by the slope a. Therefore, we first fit a with the 
linear relation log X 8 = logcr 8 + a log(T> m /0.27j, and then cal¬ 
culate the 1-cr interval of Z 8 on the O nl -Z 8 plane. For a Bayesian 
approach, this interval is given by the 68% most probable in¬ 
terval from the marginalized likelihood, while for a frequentist 
approach, significance levels are given by likelihood-ratio tests 
on the marginalized likelihood. Examples of both approaches are 
shown by Fig. 2. Since no real data are used in this study, we are 
not interested in the best fit value, but the 1-cr interval width A2 8 . 

The second indicator is the figure of merit (FoM) for Q m and 
<x 8 , proposed by Dietrich & Hartlap (2010). They define a FoM 
similar to the one from Albrecht et al. (2006) as the inverse of 
the area of the 2-cr region. 

3. Influence of the cosmology-dependent 
covariance 

3.1. Formalism 

In this section, we examine the cosmology-dependent- 
covariance (CDC) effect. From our statistic, we estimate the in¬ 
verse covariance from Eq. (3) for each n from 1000 realizations. 
By setting the Bayesian evidence to unity, P(x = x obs ) = 1, we 
write the relation among prior probability 'P(jr), the likelihood 
£(n) = P(x obs \7i), and posterior probability |x obs ) as 

P{7t\x oh *) = £(n)V( jt). (6) 

Given a model, we write Ax(n) = x moi (jr) - x obs as the dif¬ 
ference between the model prediction x mod and the observation 
x obs . Then the Gaussian log-likelihood is given by 

-2 In £(jc) = In [(litf det(C)] + Ax T C^Ax (7) 

where d denotes the dimension of the observable space, and C is 
the covariance matrix for jc rnod . 

Estimating C as an ensemble average is difficult since cos- 
mologists only have one Universe. One can derive C from ob¬ 
servations with statistical techniques, such as bootstrap or jack¬ 
knife (LPL15), or from a sufficient number of independent fields 
of view from /V-bodv simulations (LPH15) or using analytic cal¬ 
culations. However, the first method only provides the estima¬ 
tion for a specific parameter set C(tt = 7r" hs ); the second method 
is limited to a small amount of parameters owing to the very 
high computational time cost; and the third method involves 
higher order statistics of the observables, which might not be 
well known. Thus, most studies suppose that the covariance ma¬ 
trix is invariant so ignore the CDC effect. In this case, the deter¬ 
minant term becomes a constant, and likelihood analysis can be 
summed up as the minimization of^ 2 = Ax T C~' Ax. 

Alternatively, the stochastic characteristic of our model pro¬ 
vides a quick and simple way to estimate the covariance matrix 
C(n) of each single parameter set n. To examine the impact of 
the CDC effect in the peak-count framework, we write down the 
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fI m -S 8 constraint 



log O m form an approximately linear degenerency, the quantity Eg = <xg(f2 m /0.27)“ allows us to characterize the banana-shape contour thickness. 
Right panel: the marginalized PDF of Eg. The dashed lines give the 1-cr interval (68.3%), while the borders of the shaded areas represent 2-<x 
limits (95.4%). Left panel: the log-value of the marginalized likelihood ratio. Dashed lines in the left panel give the corresponding value for 1 and 
2-cr significance levels, respectively. 


constant-covariance Gaussian (labeled eg), the semi-varying- 
covariance Gaussian (labeled svg), and the varying-covariance 
Gaussian (labeled vg) log-likelihoods as 

L cg s A x T (n) C“V’ bs ) A x(7t), ( 8 ) 

L svg = A x T (n) C~ x {it) Ax(?r), and (9) 

L vg = In |^det C(tt)J + A x T (n) C~ l (7t) Ax(n). (10) 

Here, the term C , (7r ohs ) in Eq. ( 8 ) refers to C 1 (rr 111 ), where n'" is 
described in Sect. 2.2. By comparing the contours derived from 
different likelihoods, we aim to measure ( 1 ) the evolution of the 
X 2 term by substituting the constant matrix with the true vary¬ 
ing C~\ and (2) the impact from adding the determinant term. 
Therefore, L svg is just an illustrative case to assess the influence 
of the two terms in the likelihood. 

3.2. Thex 1 term 

The lefthand panel of Fig. 3 shows the comparison between con¬ 
fidence regions derived from L cg and L svg with x abd . It shows a 
clear difference of the contours between L cg and L svg . Since the 
off-diagonal correlation coefficients are weak (as shown in Ta¬ 
ble 3), the variation in diagonal terms of C plays a major role in 
the size of credible regions. The isolines for C 55 are also drawn 
in Fig. 3. These isolines cross the £2 m -crg degenerency lines from 
L cg and thus shrink the credible region. We also find that the iso¬ 
lines for C 11 and C 22 are noisy and that those for C 33 and C 44 
coincide well with the original degeneracy direction. 

Table 4 shows the values of both criteria for different like¬ 
lihoods. We observe that using L svg significantly improves the 
constraints by 24% in terms of FoM. Regarding AZg, the im¬ 
provement is weak. As a result, using varying covariance ma¬ 
trices breaks down part of the banana-shape degenerency and 
shrinks the contour length, but does not reduce the thickness. 

In the lefthand panels of Fig. 4, we show the same constraints 
derived from two other observables x pcl5 and x cut5 . We see a sim¬ 
ilar CDC effect for both. We observe that x pct5 has less constrain¬ 
ing power than x abd5 , and x cut5 is outperformed by both other 
data vectors. This is due to the cutoff value v m i„. Introducing a 


cutoff at v m i n = 3 decreases the total number of peaks and ampli¬ 
fies the fluctuation of high-peak values in the CDF. When we use 
percentiles to define observables, the distribution of each com¬ 
ponent of x cut5 becomes wider than the one of the corresponding 
component of x pctS , and this greater scatter in the CDF enlarges 
the contours. However, the cutoff also introduces a tilt for the 
contours. Table 5 shows the best fit a for the different cases. 
The difference in the tilt could be a useful tool for improving the 
constraining power. This has also been observed by Dietrich & 
Hartlap (2010). Nevertheless, we do not take on any joint analy¬ 
sis since x abd5 and x cllt5 contain essentially the same information. 


3.3. Impact from the determinant term 

The righthand panel of Fig. 3 shows the comparison between 
L svg and L vg with x abd5 . It shows that adding the determinant 
term does not result in significant changes of the parameter con¬ 
straints. The isolines from ln(detC) explain this, since the gradi¬ 
ents are perpendicular to the degenerency lines. We observe that 
including the determinant makes the contours slightly larger, but 
almost negligibly so. The total improvement in the contour area 
compared to L cg is 22%. 

However, a different change is seen for x pct5 and x cutS . 
Adding the determinant to the likelihood computed from these 
observables induces a shift of contours toward the higher f 2 m 
area. In the case of x cut5 , this shift compensates for the contour 
offset from the varying x 2 term, but does not improve either AZg 
or FoM significantly, as shown in Table 4. As a result, using the 
Gaussian likelihood, the total CDC effect can be summed up as 
an improvement of at least 14% in terms of thickness and 38% 
in terms of area. 

The results from Bayesian inference is very similar to the 
likelihood-ratio test. Thus, we only show their AXg and FoM in 
Table 6 and best fits in Table 7. We recall that a similar analysis 
was done by Eifler et al. (2009) on shear covariances. Our ob¬ 
servations agree with their conclusions: a relatively large impact 
from the x 2 term and negligible change from the determinant 
term. However, the total CDC effect is more significant in the 
peak-count framework than for the power spectrum. 
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Fig. 3. Confidence regions derived from L cg , L svg , and L vg with x abd5 . The solid and dashed lines represent L cg in the left panel and L vg in the right 
panel, while the colored areas are from L svg . The black star stands for 7 r ln and gray areas represent the non-explored parameter space. The dotted 
lines are different isolines, the variance C 55 of the bin with highest S/N in the left panel and ln(detC) for the right panel. The contour area is 
reduced by 22% when taking the CDC effect into account. The parameter-dependent determinant term does not contribute significantly. 


4. Testing the copula transform 

4.1. Formalism 

Consider a multivariate joint distribution P(x\,..., xf). In gen¬ 
eral, P could be far from Gaussian so that imposing a Gaussian 
likelihood could induce biases. The idea of the copula technique 
is to evaluate the likelihood in a new observable space where the 
Gaussian approximation is better. Using a change in variables, 
individual marginalized distributions of P can be approximated 
to Gaussian ones. This is achieved by a series of successive one¬ 
dimensional, axis-wise transformations. The multivariate Gaus- 
sianity of the transformed distribution is not garanteed. How¬ 
ever, in some cases, this transformation tunes the distribution 
and makes it more “Gaussian”, so that evaluating the likelihood 
in the tuned space is more realistic (Benabed et al. 2009; Sato 
et al. 2011). 

From Sklar’s theorem (Sklar 1959), any multivariate distri¬ 
bution P(x 1 ,..., Xd) can be decomposed into the copula density 
multiplied by marginalized distributions. A comprehensible and 
elegant demonstration is given by Rtischendorf (2009). Readers 
are also encouraged to follow Scherrer et al. (2010) for detailed 
physical interpretations and Sato et al. (2011) for a very peda¬ 
gogical derivation of the Gaussian copula transform. 

Consider a c/-dimcnsional distribution P(x), where x = 
(x\,..., xf) is a random vector. We let P, be the marginalized 1- 
point PDF of Xj, and F, the corresponding CDF. Sklar’s theorem 
shows that there is a unique d-dimensional function c defined on 
[0, Y\ d with uniform marginal PDF, such that 


P(x) = c(u)P i(xi) ■ ■ ■ P d (Xd), 


( 11 ) 


where u, = Ffxi). The function c is called the copula density. 
On the other hand, let q, = <lv '(t/,), where <I>, is the CDF of the 
normal distribution with the same means p, and variances erf as 


the laws P,, such that 


®,■(?.■) 





(pi(q')dq'. 


1 

,- exp 

Jl^rf 


(q, - Pi) 2 

2 "? 


( 12 ) 

(13) 


We can then define a new joint PDF P' in the q space that cor¬ 
responds to P in x space, i.e. P'(q) = P(x). The marginal PDF 
and CDF of P' are only (h, and <h,, respectively. Thus, applying 
Eq. (11) to P' and <p t leads to 


P\q) = c(u)(pftq l )---<t>d(qd)- 


(14) 


By the uniqueness of the copula density, c in Eqs. (11) and (14) 
are the same. Thus, we obtain 


P(x) = P\q ) 


Pl(Xi)---P d (Xd) 
fM\) • ■ • Mqd) ' 


(15) 


We note that the marginal PDFs of P' are identical to a multi¬ 
variate Gaussian distribution f with mean p and covariance C, 
where C is the covariance matrix of x. The PDF of <f> is given by 


(p(q) = 


rf(27t) d detC 


exp 


1 1 

't ~Pi) c ij (qj-pj) 


(16) 


Finally, by approximating P' to <p, one gets the Gaussian copula 
transform: 


P(x) = tpiq) 


P\(xf) ■ ■ ■ Pd(x d ) 

<Pi(qi) ■■■Mqd)’ 


(17) 


Why is it more accurate to calculate the likelihood in this 
way? In the classical case, since the shape of P(x) is unknown, 
we approximate it to a normal distribution: P(x) ~ <p(x). Apply¬ 
ing the Gaussian copula transform means that we carry out this 
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Fig. 4. Similar to Fig. 3. Confidence regions with jc pct5 and x cut5 . Both upper panels are drawn with jc pct5 and both lower panels with x cut5 . Both 
left panels are the comparison between L cg and L svg , and both right panels between L svg and L vg . 


approximation in the new space of q: P'iq) ~ (f>iq ). Since qi = 
<t ) J l (Fi(xi)), at least the marginals of P'iq) are strictly Gaussian. 
And Eq. (17) gives the corresponding value in x space, while tak¬ 
ing P'iq ) * <piq) in q space. However, in some cases, the copula 
has no effect at all. We consider fix, if) = Icfiiix, y)®ixy) where 
<pi is the two-dimensional standard normal distribution, and 0 is 
the Heaviside step function. The value of / is two times fo if x 
and y have the same sign and 0 otherwise. The marginal PDF of 
/ and 02 turn out to be the same. As a result, the Gaussian copula 
transform does nothing and / remains extremely non-Gaussian. 
However, if we do not have any prior knowledge, then the result 
with the copula transformation should be at least as good as the 
classical likelihood. 


By applying Eq. (17) to P(jc obs |7r), one gets the copula likeli¬ 
hood: 


-Lin) = 


V(2^det C 


x exp 


1=1 7=1 







(18) 


In this paper, //,■ = .v|." od . Including the dependency on n for all 
relevant quantities, the varying-covariance copula log-likelihood 
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L vc is given by 
L vc = In |det C(tt)] 


d d 


+ XZ - *r°V)) c b 1(jr) (^ bS(jr) - x " od(;r) ) 
<=1 1=1 


“ 2 X ln c>, ( 7r ) “ X 

1=1 i= 

d 

-2^1nP,U- bs W. 


d / _obs 


^ bs (^-xf^r) 

o-iW 


(19) 


Here, P,(-|7r) is the i-th marginal ^-dependent PDF that we esti¬ 
mate directly from the N samples xf\ k = 1... At already men¬ 
tioned in Sect. 2.2, using the kernel density estimation (KDE): 


Pi(xd = 



( 20 ) 


where the kernel K is Gaussian, and the bandwidth h, = 
(4/3At)0 5 d-,- is given by Silverman’s rule (Silverman 1986). 
These are one-dimensional PDF estimations, and the time cost 
is almost negligible. The term P,(x° bs |7r) should be understood 
as a one-point evaluation of this function at x° bs . The quantities 
x” lod (jr), &i(n), and CT 1 (n) are estimated with the same set fol¬ 
lowing Eqs. (1), (2), and (3). Finally, q° bs (n) = <&J l (Fj(x° hs \n)). 
We highlight that ,F;(-|jt) is the CDF that corresponds to Pj(-\n), 
and O, also depends on n via p, and cr,-. 

We are also interested in studying the copula transform un¬ 
der the constant-covariance situation. In this case, we define the 
constant-covariance copula log likelihood L cc as 


d d _ 

Uc ^ X X (^ bS(?r) - *rV)) c b 1(7r ‘ n) (^ bS(jr) - ^ od(?r) ) 

1=1 j=\ 

(q ob \n) - x moi {n)^ 

I “ Z 

1=1 


■Z 




2 2 >G 


v °bs _ mod 

A; A. 


(tt)) ■ 


( 21 ) 


Besides the constant covariance, we also suppose that the distri¬ 
bution of each x, around its mean value does not vary with tc. 
In this case, P,(-) denotes the zero-mean marginal PDF, and it is 
only estimated once from the 1000 realizations of n"\ as are dy 

and Cr. 1 . We recall that q° bs (n ) = Or 1 (F , ,(x° bs - x™ od (7r))) where 
<I>, depends on n implicitly via p, and cr,. 


4.2. Constraints using the copula 

We again use the setting described in Sect. 2.2. We outline two 
interesting comparisons, which are shown in Fig. 5: between L cg 
and L cc in the lefthand panel and between L cc and L vc in the 
righthand one, both with x ,lbd \ The lefthand panel shows that, for 
weak-lensing peak counts, the Gaussian likelihood is a very good 
approximation. Quantitative results, shown in Tables 4, 5, 6, and 
7, reveal that the Gaussian likelihood provides slightly optimistic 
Q m -cr8 constraints. We would like to emphasize that the effect 
of the copula transform is ambiguous, and both tighter or wider 
constraints are possible. This has already shown by Sato et al. 
(2011), who found that the Gaussian likelihood underestimates 


Table 4. AE 8 , the error on the parameter (5) and the figure of merit 
(FoM) for confidence regions are summarized for the different analysis 
approaches performed in this paper. L cg , L svg , and L vg are introduced in 
Sect. 3, L cc and L vc in Sect. 4, and Ltme and p-value in Sect. 5. In each 
case, we take x abd5 , x pct5 , or x cut5 as data vector as indicated in the table 
rows. 



^- a bd5 

j£pet5 

x cut5 


AS s 

FoM 

ASg 

FoM 

m 

FoM 

T cg 

0.032 

46 

0.037 

31 

0.065 

13 

-^svg 

0.031 

57 

0.032 

42 

0.054 

21 

T V g 

0.031 

56 

0.032 

43 

0.052 

18 

^CC 

0.032 

43 

0.038 

33 

0.056 

13 

^VC 

0.033 

52 

0.034 

39 

0.058 

16 

^true 

0.033 

54 

0.035 

39 

0.058 

17 

/;-value 

0.035 

39 

0.037 

27 

0.067 

12 


the constraint power for low t of the lensing power spectrum and 
overestimates it for high l. 

In the righthand panel of Fig. 5, when the CDC effect is taken 
into account for the copula transform, the parameter constrains 
are submitted to a similar change to the Gaussian likelihood. 
Tighter constraints are obtained from L vc than from L cc . Similar 
results can be found for x pct5 and x cut5 . In summary, the copula 
with varying covariance, L vc results in an FoM improvement of 
at least 10% compared to the Gaussian case with constant co- 
variance, L cg . 


5. Non-analytic likelihood analyses 

5. 1. The true likelihood 

In this section, we obtain the parameter constraints in a more 
direct way. Since our model predictions sample the full PDF, the 
PDF-Gaussianity assumption is no longer necessary. This allows 
us to go back to the true definition of the log-likelihood: 

= -21nP(* obs |7r), (22) 


where P is estimated from our N realizations x <k) (7r-dependent) 
using the kernel density estimation technique. The multivariate 
estimation is performed by 


P(X) = l Tj K( x ~ x(k)) ’ 

^ k=\ 


K(x) = 


7(271-)^ dettfl 


exp 


1 

'r 


- -x T H *x 


where 



4 

• [(d+ 2)N 
0 


i 

d +4 

<Ti 


if i = j , 
otherwise. 


(23) 

(24) 


(25) 


The evaluation of this non-analytic likelihood gets very noisy 
when the observable dimension d increases. In this case, a larger 
N will be required to stabilize the constraints. As in previous 
sections, we perform both the likelihood-ratio test and Bayesian 
inference with this likelihood. 
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Fig. 5. Confidence regions derived from copula analyses. Left panel, comparison between contours from L cg (solid and dashed lines) and L cc 
(colored areas). Right panel: comparison between contours from L cc (colored areas) and L vc (solid and dashed lines). The evolution tendency from 
L cc to L vc is similar to the evolution from L cg to L vg . 

Table 5. Best fits of (.S 8 , o-) from all analyses using the likelihood-ratio test and p-value analysis (confidence region). The description of L cg , L svg , 
and L vg can be found in Sect. 3, L cc and L cg in Sect. 4, and L true and p-value in Sect. 5. We note that the best fits for Eg are indicative since we do 
not use the real observational data in this study. 



j^abdS 

^pct5 

^cut5 


V +lcr 
^8 -lo- 

a 

Y +1 cr 

L& -lo- 

a 

Y +1<t 
^8 -lcr 

a 

T C g 

0-831“ 

0.54 

0 099 +O.OI 8 
u.oz,z,_ 0 ( )i9 

0.54 

0.800!°;“° 

0.45 

Lsvg 

0.831!°;™ 

0.52 

0-820!°;™ 

0.51 

0.800!°;™ 

0.40 

L vg 

0-829!°;™ 

0.52 

0-819!°;™ 

0.52 

0.800!°;™ 

0.42 

^CC 

0.830!°;™ 

0.54 

0 QO ^+0.018 

w.oz,j _o.020 

0.54 

0.807!°;™ 

0.46 

-^vc 

0-829!°;™ 

0.52 

0-823!°;™ 

0.53 

0.798!°;™ 

0.44 

^true 

0.828!°;™ 

0.53 

0 Q99+0-015 

u.oz,d_oo20 

0.53 

0.800!°;™ 

0.44 

p-value 

0 Q9^ + 0d)16 
u.ojj 0 .019 

0.54 

H 099 + 0.018 

U.oZ J o 018 

0.54 

0.798!°;™ 

0.45 


Table 6. Similar to Table 4, but for credible regions. The description of 
ABC can be found in Sect. 6. 



^abd5 

^pct5 

x eut5 


ae 8 

FoM 

ax 8 

FoM 

ae 8 

FoM 

Leg 

0.033 

43 

0.038 

31 

0.066 

15 

-^SVg 

0.031 

53 

0.033 

41 

0.056 

20 

Lvg 

0.031 

53 

0.032 

40 

0.055 

18 

^CC 

0.033 

40 

0.040 

30 

0.071 

14 

^VC 

0.033 

47 

0.035 

36 

0.060 

16 

^true 

0.034 

49 

0.036 

36 

0.061 

17 

ABC 

0.056 

31 

0.044 

33 

0.068 

16 


5.2. p-value analysis 

Another non-analytic technique is the p-value analysis. This fre- 
quentist approach provides the significance level by directly de¬ 
termining the p-value associated with a observation x obs . The 


p-value is defined as 

p=l-J d d x P(x\7t) x © (P(x\7t) - P(x obs \7t )), (26) 

where 0 denotes the Heaviside step function. The integral ex¬ 
tends over the region where x is more probable than x obs for a 
given 7 r, as shown by Fig. 6 . Thus, the interpretation of Eq. (26) 
is that if we generated N universes, then at least (1 - p)N of 
them should have an observational result “better” than x obs . In 
this context, "better" refers to a more probable realization. The 
significance level is determined by the chi-squared distribution 
with d — 2 degree of freedom, for two free parameters, Q m and 
<x 8 . As Fig. 7 shows, this provides a straightforward way to dis¬ 
tinguish different cosmological models. 

As in Sect. 4.2, we used KDE to estimate the multivari¬ 
ate PDF and numerically integrated Eq. (26) to obtain the p- 
value. Monte Carlo integration is used for evaluating the five¬ 
dimensional integrals. 
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Table 7. Similar to Table 5, but for Bayesian inference (credible region). The description of ABC can be found in Sect. 6. We note that the best 
fits for Eg are indicative since we do not use the real observational data in this study. 



Jt a bd5 

x pct5 

x cut5 


V + lcr 
-lcr 

a 

V +lcr 
-lcr 

a 

V + lcr 
-lcr 

a 

L cg 

0-831" 

0.54 

0-822" 

0.54 

0-800" 

0.45 

L S vg 

0-831" 

0.52 

0-820t°i b 

0.51 

0-800" 

0.40 

Lyg 

0-829" 

0.52 

0 819 +0015 

1 -0.017 

0.52 

0-800" 

0.42 

^CC 

O 

OO 

u> 

o 

1 + 
o o 

b b 

-j ^3 

0.54 

0 Q9S+0.018 
w,oz ' -0.022 

0.54 

0-807" 

0.46 

-^VC 

0-829" 

0.52 

0-823" 

0.53 

0.798" 

0.44 

^true 

0-828" 

0.53 

0-823" 

0.53 

0-800" 

0.44 

ABC 

0-819" 

0.50 

a Qi '7+0.022 
-0.022 

0.51 

0 7QQ + ° 034 
' ^-0.034 

0.42 




Fig. 8. Left panel: confidence regions derived from L vc (colored areas) and (solid and dashed lines) with x abd5 . Right panel: confidence regions 
derived from L vc (colored areas) and p-value analysis (solid and dashed lines). The contours from L, rlle and p-value analysis are noisy due to a 
relatively low N for probability estimation. We notice that L vc and L true yield very similar results. 


5.3. Parameter constraints 

Figure 8 shows the confidence contours from L tlue and p-value 
analysis with observables * abd5 . We notice that these constraints 
are very noisy. This is due to a relatively low number of real¬ 
izations to estimate the probability and prevents us from mak¬ 
ing definite conclusions. Nevertheless, the result from the left- 
hand panel reveals good agreement between constraints from 
two likelihoods. This suggests that we may substitute L true with 
the CDC-copula likelihood to bypass the drawback of noisy esti¬ 
mation from Ltrue- In the righthand panel, the result from p-value 
analysis seems to be larger. We reduced the noise for the p-value 
analysis by combining x abds into a two-component vector. In this 
case, the p-value is evaluated using the grid integration. This data 
compression technique does not significantly enlarge but visibly 
smooths the contours. 

In the L tlue case, the probability information that we need 
is local since the likelihood P is only evaluated at jt ohs . For p- 
value analysis, one needs to determine the region where P(x) < 
P(x° bs ) and integrate over it, so a more global knowledge of Pix) 
is needed in this case. We recall that KDE smooths, thus the 


estimation is always biased (Zambom & Dias 2012). Other es¬ 
timations, for example using the Voronoi-Delaunay tessellation 
(see, e.g., Schaap 2007; Guio & Achilleos 2009), could be an 
alternative to the KDE technique. As a result, observable choice, 
data compression, and density estimation need to be considered 
jointly for all non-analytic approaches. 

Recent results from CFHTLenS (LPH15) and Stripe-82 
(LPL15) resulted in AEg ~ 0.1, about 2-3 times larger than 
this study. However, we would like to highlight that redshift 
errors are not taken into account here and that the simulated 
galaxy density used in this work is much higher. Also, we choose 
Z s = 1, which is higher than the median redshift of both surveys 
(~ 0.75). All these factors contribute to our smaller error bars. 


6. Approximate Bayesian computation 

6.1. PMC ABC algorithm 

In the previous section, we presented parameter constraints de¬ 
rived from directly evaluating the underlying PDF. Now, we want 


Article number, page 10 of 15 































































C.-A. Lin & M. Kilbinger: A new model to predict weak-lensing peak counts II. 



Fig. 6. Example for the p-value determination. The x-axis indicates a 
one-dimensional observable, and the y -axis is the PDF. The PDF is ob¬ 
tained from a kernel density estimation using the N = 1000 realizations. 
Their values are shown as bars in the rug plot at the top of the panel. 
The shaded area is the corresponding p-value for given observational 
data x obs . 



Observable x 


Fig. 7. PDF from two different models (denoted by and n 2 ) and the 
observation x obs . The dashed lines show 1-cr intervals for both models, 
while the shaded areas are intervals beyond the 2-<x level. In this figure, 
model 7Ti is excluded at more than 2 -a, whereas the significance of the 
model n 2 is between 1 and 2-cr. 

to move a step further and bypass the likelihood estimation alto¬ 
gether. 

Based on an accept-reject rule, approximate Bayesian com¬ 
putation (ABC) is an algorithm that provides an approximate 
posterior distribution of a complex stochastic process when eval¬ 
uating the likelihood is expensive or unreachable. There are only 
two requirements: (1) a stochastic model for the observed data 
that samples the likelihood function of the observable and (2) a 
measure, called summary statistic, to perform model compari¬ 
son. We present below a brief description of ABC. Readers can 
find detailed reviews of ABC in Marin et al. (2011) and Sect. 1 
of Cameron & Pettitt (2012). 

The idea behind ABC can be most easily illustrated in the 
case of discrete data as follows. Instead of explicitly calculat¬ 
ing the likelihood, one first generates a set of parameters { 717 } 
as samples under the prior Pin), and then for each n, simu¬ 
lates a model prediction X sampled under the likelihood function 
P(-\ni). (Here we put X in the upper case to emphasize that X is 
a random variable.) Keeping only those n, for which X = x obs , 
the distribution of the accepted samples T’abgW equals the pos¬ 
terior distribution of the parameter P{n\x° bs ) given the observed 


data, since 

P abc(tt) = 'Yj p ( x \n)'P(n)6x,x°* 

X 

= P(x obs \n)P(n) 

= P(n\x obs ), (27) 

where 6x, x abs is Kronecker’s delta. Therefore, {zr ; } is an indepen¬ 
dent and identically distributed sample from the posterior. It is 
sufficient to perform a one-sample test : using a single realization 
X for each parameter n to obtain a sample under the posterior. 

ABC can also be adapted to continuous data and parameters, 
where obtaining a strict equality X = x obs is pratically impos¬ 
sible. As a result, sampled points are accepted with a tolerance 
level e, say \X - x obs | < e. What is retained after repeating this 
process is an ensemble of parameters n that are compatible with 
the data and that follow a probability distribution, which is a 
modified version of Eq. (27), 

PMx obs ) = A e in)Pin ), (28) 

where AJn) is the probability that a proposed parameter n passes 
the one-sample test within the error e. 

A € (n) = Jd*P(Z|;r)l|^ob S |< e (X). (29) 

The Kronecker delta from Eq. (27) has now been replaced with 
the indicator function 1 of the set of points X that satisfy the tol¬ 
erance criterion. The basic assumption of ABC is that the proba¬ 
bility distribution (28) is a good approximation of the underlying 
posterior, such that 

!P e (jr|jc obs ) » P(n\x obs ). (30) 

Therefore, the error can be seperated into two parts: one from 
the approximation above and the other from the estimation of the 
desired integral, P e . For the latter, gathering one-sample tests of 
A r makes a Monte Carlo estimation of P e , which is unbiased. 
This ensures the use of the one-sample test. 

A further addition to the ABC algorithm is a reduction in the 
complexity of the full model and data. This is necessary in cases 
of very large dimensions, for example, when the model produces 
entire maps or large catalogs. The reduction of data complexity 
is done with the so-called summary statistic s. For instance, in 
our peak-count framework, a complete data set x is a peak cat¬ 
alog with positions and S/N values, and the summary statistic s 
is chosen here to be s(x) = x abd5 , x pct5 , or x CL " 5 , respectively, for 
the three cases of observables introduced in Sect. 2.2. As a re¬ 
mark, if this summary statistic is indeed sufficient, then Eq. (30) 
will no longer be an approximation. The true posterior can be 
recovered when e —> 0. 

For a general comparison of model and data, one chooses a 
metric D adapted to the summary statistic ,v, and the schematic 
expression \X - x obs | < e used above is generalized to 
D(s(X), ,v(jf obs )) < e. We highlight that the summary statistic can 
have a low dimension and a very simple form. In practice, it is 
motivated by computational efficiency, and it seems that a sim¬ 
ple summary can still produce reliable constraints (Weyant et al. 
2013). 

The integral of Eq. (28) over n is smaller than unity, and 
the deficit only represents the probability that a parameter is re¬ 
jected by ABC. This is not problematic since density estimation 
will automatically normalize the total integral over the posterior. 
However, a more subtle issue is the choice of the tolerance level 
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Algorithm 1 Population Monte Carlo approximate Bayesian 
computation 

Require: 

number of particles p 
prior p(-) 

summary statistic ,v(-j 
distance D(-, •) 
shutoff parameter r stop 


set t — 0 

for i = 1 to p do 

generate i r| 0) from p(-) and x from pdf (jjrj 01 ) 

set 6 J 0) = s(x obs )) and = \/p 

end for 

set e <l> = median (<5; 0) ) and C (0) = cov ( 


while success rate > r stop do 
t <- t + 1 
for i = 1 to p do 

repeat 

generate j from {1 ,...,p] with weights {ujj ,..., 


generate tt® from N (n ( j 1 \C (t and jc from pdf (-|7r®) 

set b® = D (s(x), s(jt obs )) 
until <5® < e (r) 

m P\ 

set w. oc 


’(-?) 


Z>„i -np n ,(<'- n ) 

end for 

set e ( ' +1) = median (b®) and C® = cov (n'P, w®) 

end while 


e. If e is too high, A(tt) is close to 1, and Eq. (30) becomes a 
bad estimate. If e is too low, A(n) is close to 0, and sampling be¬ 
comes extremely difficult and inefficient. How, then, should one 
choose el This can be done by applying the iterative importance 
sampling approach of population Monte Carlo (PMC; for appli¬ 
cations to cosmology, see Wraith et al. 2009) and combine it 
with ABC (Del Moral et al. 2006; Sisson et al. 2007). This joint 
approach is sometimes called SMC ABC, where SMC stands for 
sequential Monte Carlo; we refer to it as PMC ABC. The idea of 
PMC ABC is to iteratively reduce the tolerance e until a stopping 
criterion is reached. 

Algorithm 1 details the steps for PMC ABC. We let pdf(;t|jr) 
be a probabilistic model for x given n. PMC ABC requires a 
prior p(ir), a summary statistic six) that retains only partial in¬ 
formation about x, a distance function D based on the dimen¬ 
sion of s(x), and a shutoff parameter r stop . We denote N(i r, C) 
as a multivariate normal with mean n and covariance matrix 
C, K(n,C) k exp (—n T C~ 1 tt/ 2^ a Gaussian kernel, covin,, w,) 
the weighted covariance for the set {n\,... ,n p ] with weights 
{uq,..., w p ), and p the number of particles, i.e., the number of 
sample points in the parameter space. 

In the initial step, PMC ABC accepts all particles drawn from 
the prior and defines an acceptance tolerance before starting the 
first iteration. The tolerance is given by the median of the dis¬ 
tances of the summary statistic between the observation and the 
stochastic model generated from each particle. Then, each iter¬ 
ation is carried out by an importance-sampling step based on 
weights determined by the previous iteration. To find a new par¬ 


ticle, a previous point n'- 11 is selected according to its weight. A 
candidate particle is drawn from a proposal distribution, which 
is a normal law centered on n’- 11 with a covariance equal to 
the covariance of all particles from the previous iteration. With 
a model generated using the candidate particle, we accept the 
new particle if the distance between the model and the observa¬ 
tion is shorter than the tolerance, and reject it otherwise. After 
accepting p particles, the success rate, defined as the ratio of ac¬ 
cepted particles to total tries, is updated. The iterations continue 
until the success rate decreases below the shutoff value. Instead 
of defining a minimal tolerance (Beaumont et al. 2002; Fearn- 
head & Prangle 2010; Weyant et al. 2013), we use a simplified 
stopping criterion that is based on the selection efficiency of the 
algorithm. Since McKinley et al. (2009) prove that the stopping 
criterion has very little impact on the estimated posterior, the 
choice of the tolerance level is instead a question of computa¬ 
tional power. 

6.2. Settings for ABC 

McKinley et al. (2009) studied the impact of the various choices 
necessary for ABC, by comparing Markov chain Monte Carlo 
(MCMC) ABC and PMC ABC. The authors concluded that (1) 
increasing the number of simulations beyond one for a given 
parameter does not seem to improve the posterior estimation 
(similar conclusion found by Bornn et al. 2014), (2) the spe¬ 
cific choice of the tolerance level does not seem to be impor¬ 
tant, (3) the choice of the summary statistic and the distance is 
crucial, and (4) PMC ABC performs better than MCMC ABC. 
Therefore, exploring a sufficient summary statistic to represent 
the whole data set becomes an essential concern for the ABC 
technique. 

To solve the optimal data compression problem, Blum et al. 
(2013) provide a series of methods in their review for selecting 
the ideal summaries, as well as methods of reducing the data 
dimension. Leaving a detailed study of optimal choice for the 
future work, we adopted a straightforward summary statistic in 
this work, defined as s(jc) = x type5 , and the distance as 



(31) 


This is simply a weighted Euclidean distance, where the weight 
C~1 is needed to level out the values of the different S/N. 

The prior p is chosen to be flat. We set p = 250 and r stop = 
0.02. In this condition, we can easily compare the computational 
cost with the analyses presented in previous sections. If r is the 
time cost for one model realization, the total time consumption 
is 7821 x 1000 x r for our likelihood-based analyses in Sects. 
3, 4, and 5, and 250 x 1 x (£, r~ l ) x r for ABC where r, is the 
acceptance rate of the f-th iteration. For :t abd5 , 2/ 1 ~ 102 (0 < 

t < 9), so the computation time for ABC is drastically reduced 
by a factor of ~ 300 compared to the likelihood analyses. ABC 
is faster by a similar factor compared to Monte-Carlo sampling, 
since typically the number of required sample points is <9(10 4 ), 
the same order of magnitude as our number of grid points. 


6.3. Results from ABC 

Figure 9 shows the iterative evolution of the PMC ABC parti¬ 
cles. We drew the position of all 250 particles and credible re¬ 
gions for the first eight iterations. The summary statistic is Jt abd5 . 
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Fig. 9. Evolution of particles and the posterior from the PMC ABC algorithm. We show results from the first 8 iterations (0 < t < 1). Particles are 
given by blue dots. Solid lines are l-<x contours, and dashed lines are 2-cr contours. White areas represent the prior. The corresponding accept rate 
r and tolerance level e are also given. We set e <0) = +oo. 


The credible regions were drawn from the posterior estimated on 
a grid using KDE with the ABC particles as sample points x (k> 
in Eq. (23). We ignored the particle weights for this density es¬ 
timate. We find that the contours stablize for 1 > 8, which corre- 
ponds to an acceptance rate of r — 0.05. At these low accpetence 
rates, corresponding to a small tolerance, the probability of sat¬ 
isfying the tolerance criterion D(s(X), s(jc obs )) < e is low even 
though X is sampled from parameters in the high-probability re¬ 
gion, and accepting a proposed particle depends mainly on ran¬ 
dom fluctuations due to the stochasticity of the model. 

In Fig. 10, we show the weights of particles sampled at the 
final iteration t = 8. The weight is visualized by both color and 
size of the circle. The figure shows that points farther away from 
the maximum have larger weights as constructed by Algorithm 
1. Since those points are more isolated, their weights compensate 
for the low density of points, avoiding undersampling the tails of 
the posterior and overestimating the constraining power. 

In Fig. 11 we show the comparison of credible regions be¬ 
tween L cg and PMC ABC with x(x) = jc abd5 . The FoM and the 
best-fit ABC results are presented in Tables 6 and 7, respectively. 
The figure shows good agreement between the two cases, and 
thus validates the performance of PMC ABC. The broader con¬ 
tours from ABC might be caused by a bias of KDE. The same 
reason might be responsible for the slight shift of the contours 


in the tails of the distribution, which do not follow the particles 
exactly, which are best visible in the two lefthand panels in the 
lower row of Fig. 9. 

7. Summary and discussion 

Our model for weak-lensing peak counts, which provides a direct 
estimation of the underlying PDF of observables, leads to a wide 
range of possibilities for constraining parameters. To summarize 
this work, we 

- compared different data vector choices, 

- studied the dependence of the likelihood on cosmology, 

- explored the full PDF information of observables, 

- proposed different constraint strategies, and 

- examined them with two criteria. 

In this paper, we performed three different series of 
analyses-the Gaussian likelihood, the copula likelihood, and 
non-analytic analyses-by using three different data vectors: one 
based on the peak PDF and two on the CDF. We defined two 
quantitative criteria: ASg, which represents the error bar on the 
parameter 2g = cr%(Q. m /0.21) a and is a measure of the width of 
the O m -crg degeneracy direction; and FoM, which is the area of 
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Fig. 10. Weights of particles from t = 8 with s(x) = jc abd5 . The weight 
is represented by the size and the color at the same time. 
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Fig. 11. Comparison between credible regions derived from L cg (col¬ 
ored areas) and ABC (solid and dashed lines). 


the O m -crg contour. Both Bayesian and frequentist approaches 
were followed. Although the interpretations are different, the re¬ 
sults are very similar. 

We studied the cosmology-dependent-covariance (CDC) ef¬ 
fect by estimating the true covariance for each parameter set. We 
found that the CDC effect can increase the constraining power up 
to 22%. The main contribution comes from the additional vari¬ 
ation of the x 2 term, and the contribution from the determinant 
term is negligible. These observations confirm a previous study 
by Eifler et al. (2009). 

We also performed a copula analysis, which makes weaker 
assumptions than Gaussianity. In this case, the marginalized PDF 


is Gaussianized by the copula transform. The result shows that 
the difference with the Gaussian likelihood is small. This is dom¬ 
inated by the CDC effect if a varying covariance is taken into 
account. 

Discarding the Gaussian hypothesis on the PDF of observ¬ 
ables, we provided two straightforward ways of using the full 
PDF information. The first one is the true likelihood. The direct 
evaluation of the likelihood is noisy owing to the high statistical 
fluctuations from the finite number of sample points. However, 
we find that the varying-covariance copula likelihood, noted as 
L vc above, seems to be a good approximation to the truth. The 
second method is to determine the p-value for a given param¬ 
eter set directly, and this approach gives us more conservative 
constraints. We outline that both methods are covariance-free, 
avoiding non-linear effects caused by the covariance inversion. 

At the end we showed how approximate Bayesian computa¬ 
tion (ABC) derives cosmological constraints using the accept- 
reject sampling. Combined with importance sampling, this 
method requires less computational resources than all the others. 
We proved that by reducing the computational time by a factor 
of 300, ABC is able to yield consistent constraints from weak- 
lensing peak counts. Furthermore, Weyant et al. (2013) show in 
their study that ABC is able to perform unbiased constraints us¬ 
ing contaminated data, demonstrating the robustness of this al¬ 
gorithm. 

A comparison between different data vectors is done in this 
study. Although we find for all analyses that jc abd5 outperforms 
x pct5 by 20%^10% in terms of FoM, this is not necessarily true 
in general when we use a different percentile choice. Actually, 
the performance of x pcl depends on the correlation between its 
different components. However, the * pct family is not recom¬ 
mended in practice because of model biases induced for very 
low peaks (S/N < 0). In addition, our study shows that the x cut 
family is largely outperformed by x' lhd . Thus, we conclude that 
* abd seems to be good candidates for peak-count analysis, while 
the change in the contour tilt from x cl " could be interesting when 
combining with other information. 

The methodology that we show for parameter constraints can 
be applied to all fast stochastic forward models. Flexible and ef¬ 
ficient, this approach possesses a great potential whenever the 
modeling of complex effects is desired. Our study displays two 
different parameter-constraint philosophies. On the one hand, 
parameteric estimation (Sects. 3 and 4), under some specific 
hypotheses such as Gaussianity, only requires some statistical 
quantities such as the covariances. However, the appropriateness 
of the likelihood should be examined and validated to avoid bi¬ 
ases. On the other hand, non-analytic estimation (Sects. 5 and 6) 
is directly derived from the PDF. The problem of inappropriate¬ 
ness vanishes, but instead the uncertainty and bias of density esti¬ 
mation become drawbacks. Depending on modeling pertinence, 
an aspect may be more advantageous than another. Although not 
studied in this work, a hybrid approach using semi-analytic es¬ 
timator could be interesting. This solicits more detailed studies 
of trade-off between the inappropriatenss of analytic estimators 
and the uncertainty of density estimation. 
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